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Abstract.  Advanced  polymorphic  type  systems  have  come  to  play  an 
important  role  in  the  world  of  functional  programming.  But,  curiously, 
these  type  systems  have  so  far  had  little  impact  upon  widely-used  imper¬ 
ative  programming  languages  like  C  and  C-| — |-.  We  show  that  ML-style 
polymorphism  can  be  integrated  smoothly  into  a  dialect  of  C,  which  we 
call  Polymorphic  C.  It  has  the  same  pointer  operations  as  C,  includ¬ 
ing  the  address-of  operator  &,  the  dereferencing  operator  *,  and  pointer 
arithmetic.  Our  type  system  allows  these  operations  in  their  full  gen¬ 
erality,  so  that  programmers  need  not  give  up  the  flexibility  of  C  to 
gain  the  benefits  of  ML-style  polymorphism.  We  prove  a  type  soundness 
theorem  that  gives  a  rigorous  and  useful  characterization  of  well-typed 
Polymorphic  C  programs  in  terms  of  what  can  go  wrong  when  they  are 
evaluated. 


1  Introduction 

Much  attention  has  been  given  to  developing  sound  polymorphic  type  systems  for 
languages  with  imperative  features.  Most  notable  is  the  large  body  of  work  sur¬ 
rounding  ML  [GMW79,  Tof90,  LeW91,  SML93,  Wri95,  VoS95],  However,  none  of 
these  efforts  addresses  the  polymorphic  typing  of  variables,  arrays  and  pointers 
(first-class  references),  which  are  essential  ingredients  of  any  traditional  imper¬ 
ative  language.  As  a  result,  they  cannot  be  directly  applied  to  get  ML-style 
polymorphic  extensions  of  widely-used  languages  like  C  and  C+- K 

This  paper  presents  a  provably-sound  type  system  for  a  polymorphic  dialect 
of  C,  called  Polymorphic  C.  It  has  the  same  pointer  operations  as  C,  including 
the  address-of  operator  &,  the  dereferencing  operator  *,  and  pointer  arithmetic. 
The  type  system  allows  these  operations  without  any  restrictions  on  them  so 
that  programmers  can  enjoy  C’s  pointer  flexibility  and  yet  have  type  security 

*  To  be  presented  at  the  1996  European  Symposium  on  Programming,  Linkoping  Swe¬ 
den,  22-24  April  1996.  This  material  is  based  upon  activities  supported  by  the  Na¬ 
tional  Science  Foundation  under  Agreements  No.  CCR-9414421  and  CCR-9400592. 
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and  polymorphism  as  in  ML.  Our  type  system  demonstrates  that  ML-style  poly¬ 
morphism  can  be  brought  cleanly  and  elegantly  into  the  realm  of  traditional 
imperative  languages. 

We  establish  a  type  soundness  theorem  that  gives  a  rigorous  and  useful  char¬ 
acterization  of  well-typed  Polymorphic  C  programs  in  terms  of  what  can  go 
wrong  when  they  are  evaluated.  Our  approach  uses  a  natural-style  semantics 
and  a  formulation  of  subject  reduction  based  on  Harper’s  syntactic  approach 
[Har94].  It  is  simple  and  does  not  require  a  separate  type  semantics.  We  ex¬ 
pect  it  to  be  useful  in  proving  type  soundness  for  a  wide  variety  of  imperative 
languages  having  first-class  pointers  and  mutable  variables  and  arrays. 

We  begin  with  an  overview  of  Polymorphic  C  in  the  next  section.  Then  we 
formally  describe  its  syntax,  type  system,  and  semantics.  Then,  in  Section  4  we 
establish  the  soundness  of  the  type  system. 

2  An  Overview  of  Polymorphic  C 

Polymorphic  C  is  intended  to  be  as  close  to  the  core  of  Kernighan  and  Ritchie  C 
[KR78]  as  possible.  In  particular,  it  is  stack-based  with  variables,  pointers,  and 
arrays.  Pointers  are  dereferenced  explicitly  using  *,  while  variables  are  derefer¬ 
enced  implicitly.  Furthermore,  pointers  are  first-class  values,  but  variables  are 
not.  Polymorphic  C  has  the  same  pointer  operations  as  C.  A  well-typed  Poly¬ 
morphic  C  program  in  our  system  may  still  suffer  from  dangling  reference  and 
illegal  address  errors.  Our  focus  has  not  been  on  eliminating  such  pointer  in¬ 
securities,  which  would  require  weakening  C’s  expressive  power,  but  rather  on 
adding  ML-style  polymorphism  to  C,  so  that  programmers  can  write  polymor¬ 
phic  functions  naturally  and  soundly  as  they  would  in  Standard  ML,  rather  than 
by  parameterizing  functions  on  data  sizes  or  by  using  pointers  of  type  void  *. 

Syntactically,  Polymorphic  C  uses  a  flexible  syntax  similar  to  that  of  core-ML 
of  Damas  and  Milner  [DaM82].  For  example,  here  is  a  Polymorphic  C  function 
that  reverses  the  elements  of  an  array: 

let  swap  =  Ax,  y.  letvar  t  :=  *x  in  *x  :=  *t/;  *y  :=  t 

in 

let  reverse  =  A  a,  n.  letvar  i  :=  0  in 
while  i  <  n  —  1  —  i  do 

swap(a  +  i,  a  +  n  —  1  —  i); 
i  :=  i  +  1 

in .  .  . 

The  construct  letvar  x  :=  e\  in  e 2  binds  x  to  a  new  cell  initialized  to  the  value 
of  ei ;  the  scope  of  the  binding  is  e 2  and  the  lifetime  of  the  cell  ends  after  e 2 
is  evaluated.  Variable  x  is  dereferenced  implicitly.  This  is  achieved  via  a  typing 
rule  that  says  that  if  e  has  type  r  var ,  then  it  also  has  type  r. 

As  in  C,  the  call  to  swap  in  reverse  could  equivalently  be  written  as 


swap(&ia[i\,  <ka[n  —  1  —  i]) 


and  also  as  in  C,  array  subscripting  is  syntactic  sugar:  ei[e2]  is  equivalent  to 
*(ei  +  e2).  Arrays  themselves  are  introduced  by  the  construct  letarr  x[ef\  in  62, 
which  binds  x  to  a  pointer  to  an  uninitialized  array  whose  size  is  the  value  of 
ei ;  the  scope  of  x  is  62,  and  the  lifetime  of  the  array  ends  after  e 2  is  evaluated. 

The  type  system  of  Polymorphic  C  assigns  types  of  the  form  r  var  to  vari¬ 
ables,  and  types  of  the  form  r  ptr  to  pointers.3  Functions  swap  and  reverse  given 
above  are  polymorphic;  swap  has  type 

Va  .  a  pir  x  a  pir  —>■  a 


while  reverse  has  type 

Va  .  a  pir  x  mi  —>■  umi 

Notice  that  pointer  and  array  types  are  unified  as  in  C.  Also,  variable  and  pointer 
types  are  related  by  symmetric  typing  rules  for  &  and  *:  if  e  :  r  var,  then  &e  : 
r  ptr,  and  if  e  :  r  ptr,  then  *e  :  r  var.  Note  that  dereferencing  in  Polymorphic 
C  differs  from  dereferencing  in  Standard  ML,  where  if  e  :  r  ref,  then  \e  :  r. 

Polymorphic  C’s  types  are  stratified  into  three  levels.  There  are  the  ordinary 
r  (data  types)  and  a  (type  schemes)  type  levels  of  Damas  and  Milner’s  system 
[DaM82],  and  a  new  level  called  phrase  types  containing  a  types  and  variable 
types  of  the  form  r  var.  This  stratification  enforces  the  “second-class”  status  of 
variables:  for  example,  the  return  type  of  a  function  must  be  a  data  type,  so  that 
one  cannot  write  a  function  that  returns  a  variable.  On  the  other  hand,  pointer 
types  are  included  among  the  data  types,  making  pointers  first-class  values. 

Polymorphic  C  has  been  designed  to  ensure  that  function  calls  can  be  im¬ 
plemented  on  a  stack  without  the  use  of  static  links  or  displays.  In  traditional 
imperative  languages,  this  property  has  been  achieved  by  rigidly  fixing  the  syn¬ 
tactic  structure  of  programs.  For  example,  in  C,  functions  can  only  be  defined 
at  top  level.  But  such  syntactic  restrictions  are  often  complex  and  unnecessarily 
restrictive.  In  contrast,  Polymorphic  C  adopts  a  completely  free  syntax,  as  in 
core-ML.  The  ability  to  implement  Polymorphic  C  on  a  stack,  without  static 
links  or  displays,  is  achieved  by  imposing  one  key  restriction  on  lambda  abstrac¬ 
tions:  the  free  identifiers  of  any  lambda  abstraction  must  be  declared  at  top  level. 
Roughly  speaking,  a  top-level  declaration  is  one  whose  scope  extends  all  the  way 
to  the  end  of  the  program.  For  example,  in  the  program 

let  /  =  .  .  .  in 
letvar  x  :=  .  .  .  in 
letarr  a  [.  .  .]  in  /(.  .  .) 

the  identifiers  declared  at  top  level  are  /,  x,  and  a.  Although  they  are  severely 
restricted,  Polymorphic  C’s  anonymous  lambda  abstractions  are  convenient  at 
times.  For  example,  we  can  write  map(Xn.  n  +  1 ,  [4,  2 ,  5] )  without  having  to 
declare  a  named  successor  function.  Nevertheless,  one  might  prefer  a  different 
syntax  for  Polymorphic  C;  it  should  be  noted  that  there  would  be  no  obstacle 
to  adopting  a  more  C-like  syntax. 

We  use  ptr  rather  than  ref  to  avoid  confusion  with  C-| — |-  and  ML  references. 
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2.1  The  Issue  of  Type  Soundness  in  Polymorphic  C 

Much  effort  has  been  spent  trying  to  develop  sound  polymorphic  type  systems 
for  imperative  extensions  of  core-ML.  Especially  well-studied  is  the  problem 
of  typing  Standard  ML’s  first-class  references  [Tof90,  LeW91,  SML93,  Wri95]. 
The  problem  is  easier  in  a  language  with  variables  but  no  references,  such  as 
Edinburgh  LCF  ML,  but  subtle  problems  still  arise  [GMW79].  The  key  problem 
is  that  a  variable  can  escape  its  scope  via  a  lambda  abstraction  as  in 

letvar  stk  :=  []  in  Xv.  stk  :=  v  ::  stk 

In  this  case,  the  type  system  must  not  allow  type  variables  that  occur  in  the  type 
of  stk  to  be  generalized.  Different  mechanisms  have  been  proposed  for  dealing 
with  this  problem  [GMW79,  VoS95] 

In  the  context  of  Polymorphic  C,  however,  we  can  adopt  an  especially  simple 
approach.  Because  of  the  restriction  on  the  free  identifiers  of  lambda  abstrac¬ 
tions,  Polymorphic  C  does  not  allow  a  polymorphic  value  to  be  computed  in  an 
interesting  way;  for  example,  we  cannot  write  curried  functions.  For  this  reason, 
we  suffer  essentially  no  loss  of  language  expressiveness  by  limiting  polymorphism 
to  syntactic  values ,  that  is,  identifiers,  literals,  and  lambda  abstractions  [Tof90].4 

Limiting  polymorphism  to  syntactic  values  ensures  the  soundness  of  poly¬ 
morphic  generalizations,  but  pointers  present  new  problems  for  type  soundness. 
If  one  is  not  careful  in  formulating  the  semantics,  then  the  subject  reduction 
property  may  not  hold.  For  example,  if  a  program  can  dereference  a  pointer  to 
a  cell  that  has  been  deallocated  and  then  reallocated,  then  the  value  obtained 
may  have  the  wrong  type.  Our  semantics  is  designed  to  catch  all  pointer  errors. 

3  The  Polymorphic  C  Language 

The  syntax  of  Polymorphic  C  is  given  below.  For  the  sake  of  describing  the  type 
system,  we  need  to  distinguish  a  subset  of  the  expressions,  called  Values,  which 
are  the  syntactic  values  [Tof90,  Wri95]  of  the  language: 

(Expr )  e  ::=  v  \  e(e1,...,en)  |  ei  :=  e2  | 

fee  |  *e  |  ei  +  C'j  |  ei  [e2]  |  ei;e2  | 

while  e\  do  e2  | 

if  e\  then  e2  else  e 3  | 

let  x  =  e\  in  e2  | 

letvar  x  :=  e\  in  e2  | 

letarr  x[e  1]  in  e2  | 

(a,  1) 

(Values)  v  ::=  x  \  e  \  Xx\,...,xn.e  \  (a,  0) 

4  In  the  context  of  a  language  with  hrst-class  functions,  limiting  polymorphism  to 
syntactic  values  does  limit  the  expressiveness  of  the  language.  But  Wright  argues 
that  even  then  the  loss  of  expressiveness  is  not  a  problem  in  practice  [Wri95]. 


Meta-variable  x  ranges  over  identifiers,  c  over  literals  (such  as  integer  literals 
and  unit),  and  a  over  addresses.  All  free  identifiers  of  every  lambda  abstraction 
must  be  declared  at  top  level ;  this  restriction  can  be  precisely  defined  by  an 
attribute  grammar. 

The  expressions  (a,  1)  and  (a,  0)  are  variables  and  pointers,  respectively. 
These  will  not  actually  occur  in  user  programs;  they  are  included  in  the  lan¬ 
guage  solely  for  the  purpose  of  simplifying  the  semantics,  as  will  become  clear  in 
Section  3.2.  Notice  that  pointers  are  values,  but  variables  are  not;  this  reflects 
the  fact  that  variables  are  implicitly  dereferenced,  while  pointers  are  not. 

The  +  operator  here  denotes  only  pointer  arithmetic.  In  the  full  language,  + 
would  be  overloaded  to  denote  integer  addition  as  well. 

A  subtle  difference  between  C  and  Polymorphic  C  is  that  the  formal  parame¬ 
ters  of  a  Polymorphic  C  function  are  constants  rather  than  local  variables.  Hence 
the  C  function  f(x)  {  b  }is  equivalent  to 

let  /  =  Xx.  letvar  x  :=  x  in  b  in  .  .  . 

in  Polymorphic  C.  Also,  Polymorphic  C  cannot  directly  express  C’s  internal 
static  variables.  For  example,  the  C  declaration 

f(x)  {  static  int  n  =  0;  b  } 

corresponds  directly  to  the  Polymorphic  C  expression 

let  /  =  letvar  n  :=  0  in  Xx.  b  in  .  .  . 

but  this  violates  the  restriction  on  lambda  abstractions  if  n  is  free  in  b.  Such 
functions  must  be  transformed  to  eliminate  static  variables  in  favor  of  uniquely- 
renamed  global  variables: 

letvar  n  :=  0  in  let  /  =  Xx.  b  in  .  .  . 


3.1  The  Type  System  of  Polymorphic  C 

The  types  of  Polymorphic  C  are  stratified  as  follows. 

r  ::=  a  |  mt  \  unit  |  r  ptr  \  T\  x  •  •  •  x  rn  — >■  r  ( data  types) 
a  ::=  Va  .  a  |  r  ( type  schemes ) 

p  ::=  <7  |  r  var  ( phrase  types) 

Meta-variable  a  ranges  over  type  variables.  Compared  to  the  type  system  of 
Standard  ML,  all  type  variables  in  Polymorphic  C  are  imperative  [Tof90]. 

The  rules  of  the  type  system  are  formulated  as  they  are  in  Harper’s  system 
[Har94]  and  are  given  in  Figure  1. 5  It  is  a  deductive  proof  system  used  to  assign 
types  to  expressions.  Typing  judgements  have  the  form 


X;j\~  e  :  p 

5  For  brevity,  we  have  omitted  typing  rules  for  sequential  composition,  if,  and  while. 


(var-id) 

(ident) 

(ptr) 

(var) 

(lit) 

(—►-intro) 
(— >-elim) 

(let-val) 

(let-ord) 

(letvar) 

(letarr) 

(r-val) 

(l-val) 

(address) 

(assign) 

(arith) 

(subscript) 


A;  7  b  x  :  t  var 
A;  7  b  x  :  t 

A;  7  I-  ((*,  j),0)  :  t  ptr 

A; t  {{hi),  1)  :  T  var 

A;  7  b  c  :  int 
A;  7  b  unit  :  unit 


j{x)  =  r  var 
j{x)  >  r 

A(0  =  t 
A(0  =  t 

c  is  an  integer  literal 


A;  7 [a?i  :  n,  .  .  . ,  xn  :  rn]  b  e  :  r _ 

A;  7  b  Xxi , . .  ,,xn.e  :  ti  x  ■  ■  ■  x  rn  —>  t 


A;  7  b  e  :  tx  x  •  •  •  x  rn  — ►  r, 

A;  7  b  e8-  :  Tj-,  1  <  i  <  n 
A;  7  b  e(ei, . .  ,,en)  :  t 

A;  7  b  w  :  ti,  A;  j[x  :  Close a;7(ti)]  b  e  :  t2 
A ;  7  b  let  x  =  v  in  e  :  r2 

A;  7  b  ei  :  Ti ,  A;  7[t  :  77]  b  e2  :  r2 

A;  7  b  let  t  =  e\  in  e2  :  r2 

A;  7  b  ei  :  77,  A;  7(3;  :  n  var]  b  e2  :  r2 

A;  7  b  letvar  t  :=  ei  in  e2  :  r2 

A;  7  b  ei  :  int,  A;  7(2;  :  Ti  ptr]  b  e2  :  r2 
A;  7  b  letarr  x[e{\  in  e2  :  r2 


A; 7  b  e  :  r  var 


A;  7  b  e  : 

T 

A;  7  b  e  : 

r  ptr 

A;  7  b  *e 

:  r  var 

A;  7  b  e  : 

r  var 

A;  7  b  fee 

:  t  ptr 

A;  7  b 

ei  :  r  var , 

A;  7  b  e2 

:  r 

A;  7  b 

ei  :=  e2  :  1 

r 

A;  7  b 

e\  :  r  ptr, 

A;  7  b  e2  : 

mi 

A;  7  b 

ei  +  e2  :  r 

ptr 

A;  7  b 

e\  :  r  ptr, 

A;  7  b  e2  : 

int 

A;  7  b  ei  [e2]  :  r  var 


Fig.  1.  Rules  of  the  Type  System 


meaning  that  expression  e  has  type  p,  assuming  that  7  prescribes  phrase  types  for 
the  free  identifiers  of  e  and  A  prescribes  data  types  for  the  variables  and  pointers 
in  e.  More  precisely,  meta-variable  7  ranges  over  identifier  typings,  which  are 
finite  functions  mapping  identifiers  to  phrase  types;  j(x)  is  the  phrase  type 
assigned  to  x  by  7  and  j[x  :  p]  is  a  modified  identifier  typing  that  assigns  phrase 
type  p  to  x  and  assigns  phrase  type  j(xr)  to  any  identifier  x'  other  than  x. 

Meta-variable  A  ranges  over  address  typings,  which  are  needed  in  typing  the 
values  produced  by  programs.  One  might  expect  that  addresses  would  just  be 
natural  numbers,  but  that  would  not  allow  the  semantics  to  detect  invalid  pointer 
arithmetic.  So  instead  an  address  is  a  pair  of  natural  numbers  (i,j)  where  i  is 
the  segment  number  and  j  is  the  offset.  Intuitively,  we  put  each  variable  or  array 
into  its  own  segment.  Thus  a  simple  variable  has  address  ( i ,  0),  and  an  n-element 
array  has  addresses 

(*',  0),  (*,  1),  n  —  1) 

Pointer  arithmetic  involves  only  the  offset  of  an  address,  and  dereferencing 
nonexistent  or  dangling  pointers  is  detected  as  a  “segmentation  fault”.  An  ad¬ 
dress  typing  then  is  a  finite  function  mapping  segment  numbers  to  data  types. 
The  reason  it  does  not  map  addresses  to  data  types  is  that  nonexistent  pointers 
can  be  produced  as  values  of  programs,  and  such  pointers  must  therefore  be 
typable  if  subject  reduction  is  to  hold.  For  example,  the  program 

letarr  a[10]  in  a  +  17 

is  well  typed  and  evaluates  to  ((0,  17),  0),  a  nonexistent  pointer.  The  notational 
conventions  for  address  typings  are  similar  to  those  for  identifier  typings. 

The  generalization  of  a  data  type  r  relative  to  A  and  7,  written  Close\.n(T), 
is  the  type  scheme  Vd  .  r,  where  d  is  the  set  of  all  type  variables  occurring  free 
in  r  but  not  in  A  or  in  7.  We  write  A  b  e  :  r  and  Close\(T)  when  7  =  0.  We 
say  that  t'  is  a  generic  instance  of  Vd  .  r,  written  Vd  .  r  >  t' ,  if  there  exists  a 
substitution  S  with  domain  d  such  that  St  =  t'  .  We  extend  this  definition  to 
type  schemes  by  saying  that  <7  >  a'  if  <7  >  r  whenever  a'  >  r.  Finally,  we  say 
that  A;y  h  e  :  cr  if  A ;  7  H  e  :r  whenever  a  >  r. 

The  type  system  has  the  property  that  the  type  of  a  value  determines  the 
form  of  the  value;  also,  an  expression  of  type  r  var  can  have  only  two  possible 
forms: 

Lemma  1  (Correct  Form).  Suppose  A  b  v  :  r.  Then 

—  if  t  is  int,  then  v  is  an  integer  literal, 

—  if  t  is  unit,  then  v  is  unit, 

—  if  t  is  t'  ptr,  then  v  is  of  the  form  ((i,j),  0),  and 

—  if  t  is  T\  x  •••  x  Tn  — >■  Tn+i,  then  v  is  of  the  form  Xxi ,  .  .  . ,  xn.e. 

Furthermore,  if  A  b  e  :  r  var,  then  e  is  of  the  form  ((i,j),  1)  or  of  the  form  *e'.6 
Proof.  Immediate  from  inspection  of  the  typing  rules.  □ 

6  Note  that  this  assumes  that  array  subscripting  is  syntactic  sugar. 


A  consequence  of  the  last  part  of  this  lemma  is  that  if  A  b  e  :  r  and  e  is  not  of  the 
form  ((i,j),  1)  or  *e/,  then  derivation  of  the  typing  judgement  cannot  end  with 
rule  (r-VAl).  So  the  typing  rules,  for  the  most  part,  remain  syntax  directed.  The 
fact  that  variables  can  have  only  two  possible  forms  is  exploited  in  our  structured 
operational  semantics,  specifically  within  rules  (ref)  and  (update). 

3.2  The  Semantics  of  Polymorphic  C 

We  give  a  structured  operational  semantics.  A  closed  expression  is  evaluated 
relative  to  a  memory  fi,  which  is  a  finite  function  from  addresses  to  values. 
It  may  also  map  an  address  to  dead  or  uninit,  indicating  that  the  cell  with 
that  address  has  been  deallocated  or  is  uninitialized.  The  contents  of  an  address 
a  G  dom(ja)  is  the  value  and  we  write  fi[a  :=  i>]  for  the  memory  that  assigns 

value  v  to  address  a,  and  value  n(ar)  to  an  address  a'  a;  fi[a  :=  i>]  is  an  update 
of  fi  if  a  G  dom(ja)  and  an  extension  of  fi  if  a  ^  dom(ja). 

The  evaluation  rules  are  given  in  Figure  2.  They  allow  us  to  derive  judgements 
of  the  form 

which  asserts  that  evaluating  closed  expression  e  in  memory  fi  results  in  value  v 
and  new  memory  p! . 

We  write  [e' / x\e  to  denote  the  capture-avoiding  substitution  of  e'  for  all  free 
occurrences  of  x  in  e.  Note  the  use  of  substitution  in  rules  (apply),  (bind), 
(bindvar),  and  (bindarr).  It  allows  us  to  avoid  environments  and  closures  in 
the  semantics,  so  that  the  result  of  evaluating  a  Polymorphic  C  expression  is 
just  another  expression  in  Polymorphic  C.  This  is  made  possible  by  the  flexible 
syntax  of  the  language  and  the  fact  that  all  expressions  are  closed,  including 
lambda  abstractions. 

4  Semantic  Soundness 

In  this  section,  we  establish  the  soundness  of  our  type  system.  We  begin  by 
using  the  framework  of  Harper  [Har94]  to  show  subject  reduction,  which  basically 
asserts  that  if  b  e  :  r  and  h  e  =>  v,n' ,  then  b  v  :  r.  But  since  e  can  allocate 
addresses  and  they  can  occur  in  v,  the  conclusion  must  actually  be  that  there 
exists  an  address  typing  A'  such  that  A'  b  v  :  r  and  such  that  p!  :  A'.  The  latter 
condition  asserts  that  A'  is  consistent  with  p! .  More  precisely,  we  say  /a  :  A  if 

1.  dom( A)  =  {i  |  (*",  0)  G  dom(ja)},  and 

2.  for  all  (i,j),  A  b  n((i,j  j)  :  A(i)  if  n((i,j  j)  is  a  value. 

Note  that  A  must  give  a  type  to  uninitialized  and  dead  addresses  of  fi,  but  the 
type  can  be  anything. 

Before  giving  the  subject  reduction  theorem,  we  require  a  number  of  lem¬ 
mas  that  establish  some  useful  properties  of  the  type  system.  We  begin  with  a 
fundamental  type  substitution  lemma: 


(val) 

(contents) 

(deref) 

(ref) 

(offset) 

(apply) 


(update) 


(bind) 


(bindvar) 


(bindarr) 


fi  b  v  =7  v ,  fi 

a  G  dom(fi)  and  fi(a)  =  v 
[i  b  (a,  1)  =7  v,  fi 

/ihe=>(a,  0),  fi1 
a  G  domin')  and  Ai'(a)  =  v 

n  \-  *e  v ,  n' 

n  b  &(a,  1)  =7  (a,  0),  n 

/ihe=>(a,  0),  n' 
n  b  &  *  e  =7  (a,  0),  n' 

H  I-  ei  =7  ((*',  i),  0),  /^i 

Hi  b  e2  =7  n,  fi'  (n  an  integer) 

n  b  ei  +  e2  =7  ((*',  j  +  n),  0),//' 

H  b  e  =7  A*i ,  .  .  . ,  xn.  e' ,  ni 
Hi  \~  ei  =7  vi,H2 

Hn  ^  Pni/In  +  l 

Hn  + 1  l~  [«!,•••,  Pn/^1,  .  .  .  ,  Tn]e'  =7  V,  fi' 

H  b  e(ei,  .  .  ,,e„)  =7  v,// 

fi  b  e  =7  w,  fj! 

a  G  domin')  and  //(a)  7^  dead 
fi  b  (a,  1)  :=  e  =7  i>,  //'[a  :=  i>] 

H  I-  ei  =>■  (a,0),//i 
A*1  |-  e2  V,H2 

a  G  dom(/i 2)  and  /12(a)  7^  dead 
A*  b  *ei  :=  e2  =7  v,  /i2[a  :=  v] 

H  I-  ei  =>■  ^1  ;  A*i 

A*i  l~  [ui/a?]e2  =7  p2,/»2 _ 

fi  b  let  t  =  ei  in  e2  =t  V2,H2 

H  I-  ei  =>■  IT ,  A*i 
(i,  0)  ^  dom(ji  1) 

A*i[(»,  0)  :=  it]  l~  [((»,  0),  l)/a?]e2  =>•  p2,/»2 _ 

fi  b  letvar  t  :=  ei  in  e2  =7  p2,  A*2[(i,  0)  :=  dead] 

H  b  ei  =7  n,  Hi  (n  a  positive  integer) 

(i,  0)  ^  dom(ji  1) 

A«i[(i,  0),  .  .  . ,  (i,  n  —  1)  :=  uninit,  .  .  . ,  uninit]  b 

_ [((?,  0),  0)/a?]e2  =7  P2,j»2 _ 

fi  b  letarr  x[e{\  in  e2  =7 

v2,  A*2  [(*3  0),-.-,(*,Ti  —  1)  :=  dead,  .  .  . ,  dead] 


Fig.  2.  The  Evaluation  Rules 


Lemma  2  (Type  Substitution).  //A;  7  b  e  :  r,  then  for  any  substitution  S, 
S A;  S 7  b  e  :  SY,  and  the  latter  typing  has  a  derivation  no  longer  than  the  former. 

Lemma  3  (Superfluousness).  Suppose  that  A;  7  b  e  :  r.  If  i  dom(X),  then 
A[i  :  r'];  7  b  e  :  r,  and  if  x  dom( 7),  then  A;  7(3;  :  p]  b  e  :  r. 

Lemma  4  (Substitution).  If  X;y  \~  v  :  cr  and  A;  j[x  :  a]  b  e  :  t,  then  A;  7  b 
e  :  r.  A/so  if  A;  7  b  (a,  1)  :  r  var  and  A;  7(3?  :  r  i)ar]  b  e  :  r',  then  A;  7  b 
[(a,  l)/*]e  :  rb 

The  preceding  lemma  does  not  hold  for  arbitrary  expression  substitution. 

Lemma  5  (V-intro).  If  A;  7  b  e  :  r  and  ex\ ,  .  .  . ,  a„  do  not  occur  free  in  X  or  in 
7,  then  A;  7  b  e  :  Vai,  .  .  . ,  a„  .  t . 


We  can  now  give  the  subject  reduction  theorem: 

Theorem  6  (Subject  Reduction).  If  /a  b  e  =7  v,  p! ,  X  b  e  :  r,  and  fi  :  A,  then 
there  exists  X'  such  that  X  C  A',  //  :  A',  and  A'  b  v  :  t. 


Proof.  By  induction  on  the  structure  of  the  derivation  of  fi  b  e  =7  v,  p! .  Here  we 
just  show  the  (bindvar)  and  (bind)  cases. 

(bindvar).  The  evaluation  must  end  with 

A*  b  ei  =>•  vi,ni 

( i ,  0)  dom(jii ) 

A*i[(»,0)  :=  vi]  b  [((»,0),  1)/x]e2  =>  V2,S2 _ 

fi  b  letvar  x  :=  e\  in  e 2  =>  W2;At2[(®;0)  :=  dead] 

while  the  typing  must  end  with  (letvar): 

A  b  e\  :  Ti 

A;  [*  :  Ti  i;ar]  b  e2  :  T2 
A  b  letvar  *  :=  e\  in  e 2  :  T2 


and  fi  :  A.  By  induction,  there  exists  Ai  such  that  A  C  Ai,  pi\  :  Ai,  and  Aj  b  «i  : 
T\.  Since  :  Ai  and  (*",  0)  dom(ja  1),  also  i  dom(Xi).  So  Ai  C  Ai[i  :  Ti].  By 
rule  (var), 

Ai [*’  :  ri]  b  ((i,  0),  1)  :  n  var 


and  by  Lemma  3, 


Ai [i  :  ri];  [x  :  ri  var]  b  e2  :  r2 


So  we  can  apply  Lemma  4  to  get 


Ai[*  :  ti]  b  [((i,  0),  l)/*]e2  :  r2 

Also,  Hi  [(i,  0)  :=  r>i]  :  Ai[i  :  T\],  So  by  a  second  use  of  induction,  there  exists  A' 
such  that  Ai[i  :  T\]  C  A',  /i2  :  A',  and  A'  b  r2  :  r2. 

It  only  remains  to  show  that  /u2[(i,  0)  :=  dead]  :  A'.  But  this  follows  imme¬ 
diately  from  fi2  :  Ab 


Remark.  What  would  go  wrong  if  we  simply  removed  the  deallocated  address 
(i,  0)  from  the  domain  of  the  final  memory,  rather  than  marking  it  dead?  Well, 
with  the  current  definition  of  fi  :  A,  we  would  then  be  forced  to  remove  i  from 
the  final  address  typing.  But  then  fi2  —  i  :  A'  —  i  would  fail,  if  there  were  any 
dangling  pointers  ((*,  j),0)  in  the  range  of  H2  —  *•  If,  instead,  we  allowed  A'  to 
retain  the  typing  for  i,  then  the  next  time  that  ( i ,  0)  were  allocated  we  would 
have  to  change  the  typing  for  i,  rather  than  extend  the  address  typing. 

(bind).  If  e\  is  a  value  iq,  then  the  evaluation  must  end  with 

fl\~  V l  =>«!,/! 

(a  b  [vi/x\e2  =>•  V2,  /»' _ 

fi  b  let  x  =  Vi  in  e2  =>  ^2,  /a' 

while  the  typing  must  end  with  (let-VAl): 

A  b  i>i  :  Ti 

A;  [x  :  Close \(t±)\  b  e2  :  r2 

A  b  let  r  =  «i  in  e2  :  T2 

and  fi  :  A.  By  Lemma  5,  A  b  »i  :  Close\(Ti),  and  so  by  Lemma  4,  A  b  [v\/x\e2  : 
t 2 .  So  by  induction,  there  exists  A'  such  that  A  C  A',  /  :  A',  and  A'  b  i>2  :  rg. 

The  case  when  e\  is  not  a  value  is  similar,  but  Lemma  5  is  not  required,  and 
induction  is  used  twice.  □ 

The  subject  reduction  property  does  not  by  itself  ensure  that  a  type  system  is 
sensible.  For  example,  a  type  system  that  assigns  every  type  to  every  expression 
trivially  satisfies  the  subject  reduction  property,  even  though  such  a  type  system 
is  useless.  The  main  limitation  of  subject  reduction  is  that  it  only  applies  to  well- 
typed  expressions  that  evaluate  successfully.  Really  we  would  like  to  be  able  say 
something  about  what  happens  when  we  attempt  to  evaluate  an  arbitrary  well- 
typed  expression. 

One  approach  to  strengthening  subject  reduction  (used  by  Gunter  [Gun92], 
for  example)  is  to  augment  the  evaluation  rules  with  rules  specifying  that  cer¬ 
tain  expressions  evaluate  to  a  special  value,  TypeError,  which  has  no  type. 
For  example,  an  attempt  to  dereference  a  value  other  than  a  pointer  would 
evaluate  to  TypeError.  Then,  by  showing  that  subject  reduction  holds  for  the 
augmented  evaluation  rules,  we  get  that  a  well-typed  expression  cannot  evalu¬ 
ate  to  TypeError.  Hence  any  of  the  errors  that  lead  to  TypeError  cannot 
occur  in  the  evaluation  of  a  well-typed  expression.  Aside  from  the  drawback  of 
requiring  us  to  augment  the  evaluation  rules,  this  approach  does  not  give  us  as 
much  information  as  we  would  like.  It  tells  us  that  certain  bad  things  will  not 
happen  during  the  evaluation  of  well-typed  expression,  but  says  nothing  about 
what  other  bad  things  can  happen. 

We  now  present  a  different  approach  leading  to  a  type  soundness  theorem 
that  characterizes  precisely  everything  that  may  go  wrong  when  we  attempt 
to  evaluate  a  well-typed  expression.  First,  we  note  that  a  successful  evaluation 
always  produces  a  value : 


Lemma  7.  If  /a  he=>  v,)E ,  then  v  is  a  value  and  p!  is  a  memory. 

Roughly  speaking,  the  combination  of  the  subject  reduction  theorem  and  the 
correct  forms  lemma  (Lemma  1)  allows  us  to  characterize  the  forms  of  expres¬ 
sions  that  will  be  encountered  during  the  evaluation  of  a  well-typed  expression. 
This  will  allow  us  to  characterize  what  can  go  wrong  during  the  evaluation. 

To  get  a  handle  on  the  “progress”  of  an  attempted  evaluation,  it  is  helpful  to 
recast  the  evaluation  rules  as  a  recursive  evaluation  function,  eval.  For  example, 
the  (update)  rules  correspond  to  the  clauses 

eval(n,  (a,  1)  :=  e)  = 

let  ( v,fi ')  =  eval(p,e)  in 

if  a  G  dom(ja')  and  p'(a)  dead  then 
(v,n'[a  :=  v]) 
else 
fail; 

eval(n,  *ei  :=  02)  = 

let  ((a,0),/^i)  =  eval(jj,,e  1)  in 
let  ( v,/i2 )  =  eval(jj,i,  02)  in 

if  a  G  dom(p 2)  and  10-2(01)  y^  dead  then 
(v,H2[a  :=  «]) 
else 

fail; 

Introducing  eval  allows  us  to  talk  about  type  soundness  in  terms  of  what  happens 
when  eval  is  called  on  a  well-typed  program. 

Definition8.  A  call  eval(p,e)  is  well  typed  iff  there  exist  A  and  r  such  that 
fi  :  A  and  A  b  e  :  r. 

Definition  9.  An  activation  of  eval  aborts  directly  if  the  activation  itself  aborts. 
Note  that  an  activation  does  not  abort  directly  if  it  makes  a  recursive  call  that 
aborts  or  does  not  terminate. 

We  can  now  show  the  key  result  for  type  soundness: 

Theorem  10.  Suppose  that  an  activation  eval(p,e)  is  well  typed.  Then  every 
recursive  call  made  by  the  activation  is  well  typed.  Furthermore,  if  the  activation 
aborts  directly,  it  aborts  due  to  one  of  the  following  errors: 

El.  An  attempt  to  read  or  write  to  a  dead  address  (i,j). 

E2.  An  attempt  to  read  or  write  to  a  nonexistent  address  (i,j).  Address  (*",  0) 
always  will  exist,  so  the  problem  is  that  the  offset  j  is  invalid. 

E3.  An  attempt  to  read  an  uninitialized  address  (i,j). 

EJt.  An  attempt  to  declare  an  array  of  size  less  than  or  equal  to  0. 


Proof.  We  just  consider  all  possible  forms  of  expression  e.  Here  we  just  give  the 
case  e\  :=  e 2;  the  other  cases  are  quite  similar. 

If  evalln,  e\  :=  62)  is  well  typed,  then  there  exist  A  and  r  such  that  fi  :  A  and 
A  b  e\  :=  €2  :  r.  The  latter  typing  must  be  by  (assign): 

A  b  e\  :  t  var 

A  h  e2  :  t _ 

A  b  e\  :=  t2  :  t 

By  Lemma  1,  e\  must  be  of  the  form  ((i,j),l)  or  else  of  the  form  *0).  So, 
simplifying  notation  a  bit,  we  are  left  with  two  cases:  (a,  1)  :=  e  and  *ei  :=  62- 
Note  that  there  is  a  clause  of  eval  that  applies  to  each  of  these.  We  consider  the 
two  cases  in  turn. 

If  the  activation  is  evalln,  (a,  1)  :=  e),  where  fi  :  A  and  A  b  (a,  1)  :=  e  :  r, 
then  the  typing  must  end  with  (assign): 

A  h  (a, 1)  :  t  var 

Ah  e  :  t 

A  h  (a,  1)  :=  e  :  t 
So  by  (var),  A(i)  =  r,  where  a  = 

Also,  the  recursive  call  evalln, e)  is  well  typed.  If  this  call  fails  to  return, 
then  the  parent  activation  evalln,  (a,  1)  :=  e)  doesn’t  abort  directly.  If  the  call 
succeeds,  then  by  Lemma  7  it  returns  a  value  v  and  a  memory  //,  so  the  pattern- 
match  ‘let  ( v,fi ')  =  evalln, ef  doesn’t  abort. 

By  the  subject  reduction  theorem,  there  exists  A'  such  that  A  C  A',  :  A', 

and  A'  b  v  :  r.  Hence  A ' (i)  =  r,  and  so  ( i ,  0)  E  dom(n'). 

So  the  only  way  for  the  activation  eval(ji,  (a,  1)  :=  e)  to  abort  directly  is  if 
(i,  j)  domin')  or  IJ-'iifj))  =  dead.  And  since  (*",  0)  E  domin'),  we  know  that 
if  the  first  case  holds,  the  error  is  in  the  offset  j. 

If  the  activation  is  evalln,  *e%  :=  62),  where  n  '■  A  and  A  b  *ei  :=  02  '■  t,  then 
the  typing  must  end  with  (l-VAl)  followed  by  (assign): 

A  b  e\  :  r  pir 
A  b  *ei  :  r  var 

A  b  e2  :  r _ 

A  b  *ei  :=  62  :  r 

So  the  recursive  call  evalln,  ei)  is  weH  typed.  If  this  call  fails  to  return,  then  the 
parent  activation  evalln,  *e  1  :=  e2)  doesn’t  abort  directly.  If  the  call  succeeds, 
then  by  Lemma  7  it  returns  a  value  iq  and  a  memory  ni- 

By  the  subject  reduction  theorem,  there  exists  Ai  such  that  A  C  Ai,  /ij  :  Ai, 
and  Ai  b  i)i  :  r  ptr.  So  by  the  Correct  Form  lemma,  iq  is  of  the  form  ((i,j),  0) 
hence  the  pattern-match  ‘let  ((a,  0) ,  /^i)  =  evalln,  e  1)’  doesn’t  abort.  Also,  by 
(ptr),  Ai (*)  =  r. 

By  the  Superfluousness  Lemma,  Ai  b  e2  :  r,  so  the  recursive  call  evallni,  02) 
is  also  well  typed.  If  this  call  fails  to  return,  then  the  parent  activation  doesn’t 


get  stuck.  If  it  succeeds,  then  it  returns  a  value  v  and  a  memory  ^2,  so  the 
pattern-match  ‘let  (v,  fi 2)  =  eval(ni ,  62)’  doesn’t  abort.  By  the  subject  reduction 
theorem,  there  exists  A'  such  that  Ai  C  A',  fi 2  :  A',  and  A'  b  v  :  r.  Hence  A ' (i)  =  r, 
and  so  (*",  0)  E  dom(ja 2). 

So  the  only  way  for  the  activation  eval(fj,,*e  1  :=  62)  to  abort  directly  is  if 
(i,j)  dom(ja 2)  or  H2 ((*,  j))  =  dead.  And  since  (*,  0)  E  dom(ja 2),  we  know  that 
if  the  first  case  holds,  the  error  is  in  the  offset  j.  □ 

Corollary  11  (Type  Soundness).  If  A  b  e  :  r  and  fi  :  A,  then  eval(ja,e)  either 

1.  succeeds  (producing  a  value  of  type  t),  or 

2.  fails  to  halt,  or 

3.  aborts  due  to  one  of  the  errors  El,  E2,  E3,  or  Ej. 

Proof  Any  call  must  either  succeed,  fail  to  halt,  or  abort. 

If  the  call  aborts,  then  one  of  its  recursive  activations  must  abort  directly. 
Now  this  activation  must  have  been  reached  by  a  finite  path  of  recursive  calls 
from  the  root  call  eval(ja,e).  Since  the  root  call  is  well  typed,  by  Theorem  10 
all  the  calls  on  the  path  are  well  typed.  So  the  activation  that  aborts  directly  is 
well  typed.  Hence  by  Theorem  10  it  aborts  due  to  one  of  the  errors  El-Ej.  □ 

5  Discussion 

The  semantics  specifies  that  an  implementation  is  under  no  obligation  to  preserve 
the  contents  of  variables  beyond  their  scope,  which  in  turn  justifies  a  stack-based 
implementation.  Further,  there  is  no  need  for  static  links  since  all  functions 
in  Polymorphic  C  are  closed  with  respect  to  top-level  declarations.  It  is  also 
interesting  to  note  that  in  light  of  this  closure  property,  there  would  be  no  need 
to  specify  in  the  semantics  that  a  variable  dies  at  the  end  of  its  scope  if  there 
were  no  &  operator.  The  variable  would  simply  be  unreachable  in  this  case. 

To  maintain  subject  reduction,  the  semantics  also  ensures  that  any  program 
with  pointer  errors  does  not  produce  a  value.  This  requires  a  number  of  mecha¬ 
nisms,  for  example,  keeping  track  of  cells  that  have  been  deallocated,  that  we  do 
not  expect  to  see  in  any  realistic  implementation  of  the  semantics.  We  believe 
that  an  implementation,  for  the  sake  of  efficiency,  should  be  able  to  do  whatever 
it  likes  on  programs  that  do  not  yield  values,  and  hence  are  in  error,  accord¬ 
ing  to  the  semantics.  For  example,  the  semantics  does  not  prescribe  a  value  for 
dereferencing  a  dangling  pointer.  So  it  would  be  acceptable,  upon  an  attempt 
to  dereference  such  a  pointer,  for  an  implementation  to  merely  return  the  last 
value  stored  there,  as  in  C,  rather  than  detect  an  error. 

Given  that  a  real  implementation  would  not  catch  pointer  errors,  what  then  is 
the  practical  significance  of  our  type  soundness  theorem?  Two  things  can  be  said. 
First,  the  theorem  gives  a  characterization  of  the  source  of  errors — it  tells  us  that 
when  a  program  crashes  with  a  “Segmentation  fault — core  dumped”  message, 
what  causes  the  crash  is  one  of  the  errors  El-Ej  and  not,  for  example,  an  invalid 
polymorphic  generalization.  Second,  by  directly  implementing  our  semantics,  one 
can  get  a  robust  “debugging”  implementation  that  flags  all  pointer  errors. 


6  Conclusion 


Advanced  polymorphic  type  systems  have  come  to  play  a  central  role  in  the 
world  of  functional  programming,  but  so  far  have  had  little  impact  on  traditional 
imperative  programming.  We  assert  that  an  ML-st.yle  polymorphic  type  system 
can  be  applied  fruitfully  to  a  “real-world”  language  like  C,  bringing  to  it  both 
the  expressiveness  of  polymorphism  as  well  as  a  rigorous  characterization  of  the 
behavior  of  well-typed  programs. 

Future  work  on  Polymorphic  C  includes  the  development  of  a  type  inference 
algorithm  (preliminary  work  indicates  that  this  can  be  done  straightforwardly), 
the  development  of  an  efficient  implementation  (perhaps  using  the  work  of  [Le92, 
ShA95,  Ha.M95]),  and  extending  the  language  to  include  other  features  of  C, 
especially  structures. 
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